spatialHeatmap 0.99.0
[Comment-Thomas: note, to-dos are now mostly listed in margin notes on right.]
The spatialHeatmap package provides functionalities for visualizing cell-, tissue- and organ-specific data of biological assays by coloring the corresponding spatial features defined in anatomical images according to a numeric color key. The color scheme used to represent the assay values can be customized by the user. This core functionality of the package is called a spatial heatmap (SHM) plot. It is enhanced with nearest neighbor visualization tools for groups of measured items (e.g. gene modules) sharing related abundance profiles, including matrix heatmaps combined with hierarchical clustering dendrograms and network representations. The functionalities of spatialHeatmap can be used either in a command-driven mode from within R or a graphical user interface (GUI) provided by a Shiny App that is also part of this package. While the R-based mode provides flexibility to customize and automate analysis routines, the Shiny App includes a variety of convenience features that will appeal to biologists. Moreover, the Shiny App can be used on both local computers as well as centralized server-based deployments (e.g. cloud-based or custom servers) that can be accessed remotely as a public web service for using spatialHeatmap’s functionalities with community and/or private data. The functionalities of the spatialHeatmap package are illustrated in Figure 1.
Figure 1: Functionality overview
The numeric data can come as a vector, data frame, or SummarizedExperiment (SE). If vector or data frame, the sample and condition identifiers should be in the form of ’sample__condition’, e.g. ’S1__con1’. If data frame or SE, the columns and rows should be sample/conditions and assayed items (gene1, gene2) respectively. In SE, the colData slot is required and contains replicate information, while the rowData slot is optional and contains row item annotation. If the latter is available, the annotation is seen by mousing over a node in the interactive network. In the aSVG image (see aSVG below), spatial features are pre-defined and assigned unique identifiers. In visualization, only aSVG features having identical sample counterparts in data are colored (e.g. S1) in SHMs. To supplement SHMs, coexpression analysis is applied on ‘data matrix’ to identify network modules. The gene in SHMs can be investigated in the gene module it belongs to, where the module is in form of matrix heatmap and network. Lastly, the SHMs, matrix heatmap, network are all combined as an interactive Shiny app.
As anatomical images the package supports both tissue maps from public repositories and custom images provided by the user. In general any type of image can be used as long as it can be provided in SVG (Scalable Vector Graphics) format, where the corresponding spatial features have been defined (see aSVG below). The numeric values plotted onto a SHM are usually quantitative measurements from a wide range of profiling technologies, such as microarrays, next generation sequencing (e.g. RNA-Seq and scRNA-Seq), proteomics, metabolomics, or many other small- or large-scale experiments. For convenience, several preprocessing and normalization methods for the most common use cases are included that support raw and/or preprocessed data. Currently, the main application domains of the spatialHeatmap package are numeric data sets and spatially mapped images from biological, agricultural and biomedical areas. Moreover, the package has been designed to also work with many other spatial data types, such a population data plotted onto geographic maps. This high level of flexibility is one of the unique features of spatialHeatmap. Related software tools for biological applications in this field are largely based on pure web applications (Winter et al. 2007; Waese et al. 2017) or local tools (Maag 2018; Muschelli, Sweeney, and Crainiceanu 2014) that typically lack customization functionalities. These restrictions limit users to utilizing pre-existing expression data and/or fixed sets of anatomical image collections. To close this gap for biological use cases, we have developed spatialHeatmap as a generic R/Bioconductor package for plotting quantitative values onto any type of spatially mapped images in a programmable environment and/or in an intuitive to use GUI application.
The core feature of spatialHeatmap is to map the assay values (e.g. gene expression data) of one or many items (e.g. genes) measured under different conditions in form of numerically graded colors onto the corresponding cell types or tissues represented in a chosen SVG image. In the gene profiling field, this feature supports comparisons of the expression values among multiple genes by plotting their SHMs next to each other. Similarly, one can display the expression values of a single or multiple genes across multiple conditions in the same plot (Figure 3). This level of flexibility is very efficient for visualizing complicated expression patterns across genes, cell types and conditions. In case of more complex anatomical images composed of overlapping multiple layer tissues, it is important to visually expose the tissue layer of interest in the plots. To address this, several default and customizable layer viewing options are provided. They allow to hide features in the top layers by making them transparent in order to expose features below them. This transparency viewing feature is highlighted below in the mouse example (Figure 6).
To maximize reusability and extensibility, the package organizes large-scale omics assay data along with the associated experimental design information in a SummarizedExperiment object. The latter is one of the core S4 classes within the Bioconductor ecosystem that has been widely adapted by many other software packages dealing with gene-, protein- and metabolite-level profiling data (Morgan et al. 2018). In case of gene expression data, the assays slot of the SummarizedExperiment container is populated with a gene expression matrix, where the rows and columns represent the genes and tissue/conditions, respectively, while the colData slot contains replicate information. The tissues and/or cell type information in the object maps via colData to the corresponding features in the SVG images using unique identifiers for the spatial features (e.g. tissues or cell types). This allows to color the features of interest in an SVG image according to the numeric data stored in a SummarizedExperiment object. For simplicity the numeric data can also be provided as numeric vectors or data.frames. This can be useful for testing purposes and/or the usage of simple data sets that may not require the more advanced features of the SummarizedExperiment class, such as measurements with only one or a few data points. Details about how to access the SVG images and properly format the associated expression data are provided in the Supplement section of this vignette.
SHMs are images where colors encode numeric values in features of any shape. For plotting SHMs, Scalable Vector Graphics (SVG) has been chosen as image format since it is a flexible and widely adapted vector graphics format that provides many advantages for computationally embedding numerical and other information in images. SVG is based on XML formatted text describing all components present in images, including lines, shapes and colors. In case of biological images suitable for SHMs, the shapes often represent anatomical or cell structures. To assign colors to specific features in SHMs, annotated SVG (aSVG) files are used where the shapes of interest are labeled according to certain conventions so that they can be addressed and colored programmatically. SVGs and aSVGs of anatomical structures can be downloaded from many sources including the repositories described below. Alternatively, users can generate them themselves with vector graphics software such as Inkscape. Typically, in aSVGs one or more shapes of a feature of interest, such as the cell shapes of an organ, are grouped together by a common feature identifier. Via these group identifiers one or many feature types can be colored simultaneously in an aSVG according to biological experiments assaying the corresponding feature types with the required spatial resolution. Correct assignment of image features and assay results is assured by using for both the same feature identifiers. The color gradient used to visually represent the numeric assay values is controlled by a color gradient parameter. To visually interpret the meaning of the colors, the corresponding color key is included in the SHM plots. Additional details for properly formatting and annotating both aSVG images and assay data are provided in the Supplement section of this vignette.
If not generated by the user, SHMs can be generated with data downloaded from various public repositories. This includes gene, protein and metabolic profiling data from databases, such as GEO, BAR and EBI. A particularly useful resource, when working with spatialHeatmap, is the Expression Atlas from EMBL-EBI (Papatheodorou et al. 2018). This online service contains both assay and anatomical images. Its assay data include mRNA and protein profiling experiments for different species, tissues and conditions. The corresponding anatomical image collections are also provided for a wide range of species including animals and plants. In spatialHeatmap several import functions are provided to work with the expression and aSVG repository from the Expression Atlas directly. The aSVG images developed by the spatialHeatmap project are available in its own repository called spatialHeatmap_aSVGs, where users can contribute their aSVG images that are formatted according to our guidlines.1 GirkeLab in the name of this repos is a poor choice. Perhaps just call it spatialHeatmap_aSVGs.
The following sections of this vignette showcase the most important functionalities of the spatialHeatmap package using as initial example a simple to understand toy data set, and then more complex mRNA profiling data from the Expression Atlas and GEO databases. First, SHM plots are generated for both the toy and mRNA expression data. The latter include gene expression data sets from RNA-Seq and microarray experiments of Human Brain, Mouse Organs, Chicken Organs, and Arabidopsis Shoots. The first three are RNA-Seq data from the Expression Atlas, while the last one is a microarray data set from GEO. Second, gene context analysis tools are introduced, which facilitate the visualization of gene modules sharing similar expression patterns. This includes the visualization of hierarchical clustering results with traditional matrix heatmaps (Matrix Heatmap) as well co-expression network plots (Network). Third, an overview of the corresponding Shiny App is presented that provides access to the same functionalities as the R functions, but executes them in an interactive GUI environment (Chang et al., n.d.; Chang and Borges Ribeiro 2018). Fourth, more advanced features for plotting customized SHMs are covered using the Human Brain data set as an example.
The spatialHeatmap package should be installed from an R (version \(\ge\) 3.6) session with the BiocManager::install command.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("spatialHeatmap")
Next, the packages required for running the sample code in this vignette need to be loaded.
library(spatialHeatmap); library(SummarizedExperiment); library(ExpressionAtlas); library(GEOquery)
The following lists the vignette(s) of this package in an HTML browser. Clicking the corresponding name will open this vignette.
browseVignettes('spatialHeatmap')
SHMs are plotted with the spatial_hm function. To provide a quick and transparent overview how these plots are generated, the following uses a generalized toy example where a small vector of random numeric values is generated that are used to color features in an aSVG image. The image chosen for this example is an aSVG depicting the human brain. The corresponding image file ‘homo_sapiens.brain.svg’ is included in this package for testing purposes. The path to this image on a user's system, where spatialHeatmap is installed, can be obtained with the system.file function.
The following commands obtain the directory of the aSVG collection and the full path to the chosen target aSVG image on a user’s system, respectively.
svg.dir <- system.file("extdata/shinyApp/example", package="spatialHeatmap")
svg.hum <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap")
To identify features of interest in annotated aSVG images, the return_feature function can be used. The following searches the aSVG images stored in dir for the query terms ‘lobe’ and ‘homo sapiens’ under the feature and species fields, respectively. The identified matches are returned as a data.frame.
feature.df <- return_feature(feature=c('lobe'), species=c('homo sapiens'), remote=FALSE, dir=svg.dir)
## Accessing features...
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
feature.df
## feature id SVG parent index index1
## 1 occipital lobe UBERON_0002021 homo_sapiens.brain.svg LAYER_EFO 9 7
## 2 parietal lobe UBERON_0001872 homo_sapiens.brain.svg LAYER_EFO 10 8
## 3 temporal lobe UBERON_0001871 homo_sapiens.brain.svg LAYER_EFO 26 24
fnames <- feature.df[, 1]
The following example generates a small numeric toy vector, where the data slot contains four numbers and its name slot is populated with the three feature names obtained from the above aSVG image. In addition, a non-matching entry (here ‘notMapped’) is included for demonstration purposes. Note, the numbers are mapped to features via name matching among the numeric vector and the aSVG, respectively. Accordingly, only numbers and features with matching name counterparts can be colored in the aSVG image. Entries without name matches are indicated by a message printed to the R console, here “notMapped”. This behavior can be turned off with verbose=FALSE in the corresponding function call. In addition, a summary of the numeric assay to feature mappings is stored in the result data.frame returned by the spatial_hm function (see below).
my_vec <- sample(1:100, length(unique(fnames))+1)
names(my_vec) <- c(unique(fnames), 'notMapped')
my_vec
## occipital lobe parietal lobe temporal lobe notMapped
## 72 71 4 9
Next, the SHM is plotted with the spatial_hm function (Figure 2). Internally, the numbers in my_vec are translated to colors based on the color key assigned to the col.com argument, and then painted onto the corresponding features in the aSVG, where the path to the image file is defined by svg.path=svg.hum. The remaining arguments used here include: ID for defining the title of the plot; ncol for setting the column-wise layout of the plot excluding the feature legend plot on the right; and height for defining the height of the SHM relative to its width. In the given example (Figure 2) only three features in my_vec (‘occipital lobe’, ‘parietal lobe’, and ‘temporal lobe’) have matching entries in the corresponding aSVG.
shm.df <- spatial_hm(svg.path=svg.hum, data=my_vec, ID='toy', ncol=1, height=0.7, sub.title.size=20)
## Enrties not mapped: notMapped
Figure 2: SHM of human brain with toy data
The plots from left to right represent color key, SHM and legend. The colors in the first two plots depict the user provided numeric values, whereas in the legend plot they are used to map the feature labels to the corresponding spatial regions in the image.
The named numeric values in my_vec, that have name matches with the features in the chosen aSVG, are stored in the mapped_feature slot.2 The numeric values range from 9-26, while the color key on the left has a range from 0-2500. This is confusing and should be corrected.
# SHMs and mapped features are stored in a list.
names(shm.df)
## [1] "spatial_heatmap" "mapped_feature"
# Mapped features
shm.df[['mapped_feature']]
## rowID featureSVG value
## 1 toy occipital.lobe 72
## 2 toy parietal.lobe 71
## 3 toy temporal.lobe 4
This subsection introduces how to find cell- and tissue-specific assay data in the Expression Atlas database. After choosing a gene expression experiment, the data is downloaded directly into a user's R session. Subsequently, the expression values for selected genes can be plotted onto a chosen aSVG image with or without prior preprocessing steps (e.g. normalization). For querying and downloading expression data from the Expression Atlas database, functions from the ExpressionAtlas package are used (Keays 2019).
The following example searches the Expression Atlas for expression data derived from specific tissues and species of interest, here ‘cerebellum’ and ‘Homo sapiens’, respectively.
all.hum <- searchAtlasExperiments(properties="cerebellum", species="Homo sapiens")
The search result is stored in a DFrame containing accessions matching the above query. For the following sample code, the accession ‘E-GEOD-67196’ from Prudencio et al. (2015) has been chosen, which corresponds to an RNA-Seq profiling experiment of ‘cerebellum’ and ‘frontal cortex’ brain tissue from patients with amyotrophic lateral sclerosis (ALS). Details about the corresponding record can be returned as follows.
all.hum[2, ]
## DataFrame with 1 row and 4 columns
## Accession Species Type
## <character> <character> <character>
## 1 E-GEOD-67196 Homo sapiens RNA-seq of coding RNA
## Title
## <character>
## 1 Transcription profiling by high throughput sequencing of cerebellum and frontal cortex from patients of amyotrophic lateral sclerosis
The getAtlasData function allows to download the chosen RNA-Seq experiment from the Expression Atlas and import it into a RangedSummarizedExperiment object of a user's R session.
rse.hum <- getAtlasData('E-GEOD-67196')[[1]][[1]]
## Downloading Expression Atlas experiment summary from:
## ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-GEOD-67196/E-GEOD-67196-atlasExperimentSummary.Rdata
## Successfully downloaded experiment summary object for E-GEOD-67196
The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.hum. The following returns only its first five rows and columns.
colData(rse.hum)[1:5, 1:5]
## DataFrame with 5 rows and 5 columns
## AtlasAssayGroup organism individual organism_part
## <character> <character> <character> <character>
## SRR1927019 g1 Homo sapiens individual1 cerebellum
## SRR1927020 g2 Homo sapiens individual1 frontal cortex
## SRR1927021 g1 Homo sapiens individual2 cerebellum
## SRR1927022 g2 Homo sapiens individual2 frontal cortex
## SRR1927023 g1 Homo sapiens individual34 cerebellum
## disease
## <character>
## SRR1927019 amyotrophic lateral sclerosis
## SRR1927020 amyotrophic lateral sclerosis
## SRR1927021 amyotrophic lateral sclerosis
## SRR1927022 amyotrophic lateral sclerosis
## SRR1927023 amyotrophic lateral sclerosis
The following example shows how to download from the EBI SVG repository an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded in the previous subsection. The return_feature function queries the repository for feature- and species-related keywords, here c('frontal cortex', 'cerebellum') and c('homo sapiens', 'brain'), respectively. To return aSVGs with at least one feature and one species match, the argument keywords.any is set to TRUE by default. When return.all=FALSE, only aSVGs matching the query keywords are returned and saved under dir. Otherwise, all aSVGs are returned regardless of the keywords. To avoid overwriting of existing SVG files, it is recommended to start with an empty target directory, here ~/test. To search a local directory for matching aSVG images, the argument setting remote=FALSE needs to be used, while specifying the path of the corresponding directory under dir. All or only matching features are returned if match.only is set to FALSE or TRUE, respectively.
dir.create('~/test') # Create empty directory
feature.df <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), keywords.any=TRUE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=TRUE, desc=FALSE) # Query aSVGs
feature.df[1:8, ] # Return first 8 rows for checking
unique(feature.df$SVG) # Return all matching aSVGs
To build this vignettes according to the R/Bioconductor package requirements, the following code section uses the aSVG file instance included in the spatialHeatmap package rather than the downloaded instance from the previous example.
feature.df <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE)
## Accessing features...
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
Note, the target tissues frontal cortex and cerebellum are included in both the experimental design slot of the downloaded expression data as well as the annotations of the aSVG. This way these features can be colored in the downstream SHM plots. If necessary users can also change the feature identifiers and names in an aSVG. Details on this utility are provided in the Supplement.
feature.df
## feature id SVG parent index
## 1 middle frontal gyrus UBERON_0002702 homo_sapiens.brain.svg LAYER_EFO 8
## 2 cingulate cortex UBERON_0003027 homo_sapiens.brain.svg LAYER_EFO 21
## 3 prefrontal cortex UBERON_0000451 homo_sapiens.brain.svg LAYER_EFO 23
## 4 frontal cortex UBERON_0001870 homo_sapiens.brain.svg LAYER_EFO 24
## 5 cerebral cortex UBERON_0000956 homo_sapiens.brain.svg LAYER_EFO 25
## 6 cerebellum UBERON_0002037 homo_sapiens.brain.svg LAYER_EFO 27
## index1
## 1 6
## 2 19
## 3 21
## 4 22
## 5 23
## 6 25
Since the Expression Atlas supports the cross-species anatomy ontology, the corresponding UBERON identifiers are included in the id column of the data.frame returned by the above function call of return_feature (Mungall et al. 2012). This ontology is also supported by the rols Bioconductor package (Gatto 2019).
For organizing experimental designs and downstream plotting purposes it can be desirable to shorten the text in certain columns of colData. This way one can use the source data for including ‘pretty’ sample names in columns and legends of all downstream tables and plots, respectively, in a consistent and automated manner. To achieve this, the following example imports a ‘targets’ file that can be generated and edited by the user in a text or spreadsheet program. In the following example the targets file is used to replace the text in the colData slot with the shortened version.
Import of custom target file defining simplified sample labels and experimental design.
hum.tar <- system.file('extdata/shinyApp/example/target_human.txt', package='spatialHeatmap')
target.hum <- read.table(hum.tar, header=TRUE, row.names=1, sep='\t')
Load custom target data into colData slot.
colData(rse.hum) <- DataFrame(target.hum)
A slice of the simplified colData object is shown below, where the disease column contains now shorter labels than in the original data set. Additional details for generating and using targets files in spatialHeatmap are provided in the Supplement of this vignette.
colData(rse.hum)[c(1:3, 41:42), 4:5]
## DataFrame with 5 rows and 2 columns
## organism_part disease
## <character> <character>
## SRR1927019 cerebellum ALS
## SRR1927020 frontal cortex ALS
## SRR1927021 cerebellum ALS
## SRR1927059 cerebellum normal
## SRR1927060 frontal cortex normal
The actual gene expression data of the downloaded RNA-Seq experiment is stored in the assay slot of rse.hum. Since it contains raw count data, it is often desirable to apply basic preprocessing routines prior to plotting spatical heatmaps. These preprocessing steps are optional, and thus can be skipped if needed. The following shows how to normalize the count data, aggregate replicates and then remove genes with unreliable expression responses. These preprocessing steps are optional, and thus can be skipped as needed.3 I think one should assure that skipping the preprocessing works seemlessly even without aggregating replicates by letting the user choose which replicate to use. Is this possible yet? This is also important when users start with their fully preprocessed data which I assume will be a very common situation.
For normalizing raw count data from RNA-Seq experiments, the norm_data can be used. It supports the following pre-existing functions from widely used packages for analyzing count data in the next generation sequencing (NGS) field: calcNormFactors (CNF) from edgeR (McCarthy et al. 2012); as well as estimateSizeFactors (EST), varianceStabilizingTransformation (VST), and rlog from DESeq2 (Love, Huber, and Anders 2014). The argument norm.fun specifies one of the four internal normalizing functions: CNF, EST, VST, and rlog. If norm.fun='none', no normalization is applied. The arguments for each normalizing function are provided via parameter.list, which is a list with named slots. For example, norm.fun='ESF' and parameter.list=list(type='ratio') is equivalent to estimateSizeFactors(object, type='ratio').
If paramter.list=NULL, the default arguments are used by the normalizing function assigned to norm.fun. For additional details, users want to consult the help file of the norm_data function by typing ?norm_data in the R console.
The following example uses the ESF normalization option. This method has been chosen mainly due to its good time performance.
se.nor.hum <- norm_data(data=rse.hum, norm.fun='ESF', data.trans='log2')
## Normalising: ESF
## type
## "ratio"
Replicates are aggregated with the aggr_rep function, where the summary statistics can be chosen under the aggr argument (e.g. aggr='mean'). The columns specifying replicates can be assigned to the sam.factor and con.factor arguments corresponding to samples and conditions, respectively. For tracking, the corresponding sample/condition labels are used as column titles in the aggregated assay instance, where they are concatenated with a double underscore as separator. In addition, the corresponding rows in the colData slot are collapsed accordingly.
se.aggr.hum <- aggr_rep(data=se.nor.hum, sam.factor='organism_part', con.factor='disease', aggr='mean')
assay(se.aggr.hum)[1:3, ]
## cerebellum__ALS frontal.cortex__ALS cerebellum__normal
## ENSG00000000003 7.024054 7.091484 6.406157
## ENSG00000000005 0.000000 1.540214 0.000000
## ENSG00000000419 7.866582 8.002549 8.073264
## frontal.cortex__normal
## ENSG00000000003 7.004446
## ENSG00000000005 1.403110
## ENSG00000000419 7.955709
To remove unreliable expression measures, filtering can be applied. The following example eliminates genes with expression values larger than 5 (log2 space) in at least 1% of all samples (pOA=c(0.01, 5)), while retaining genes with a coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)).
se.fil.hum <- filter_data(data=se.aggr.hum, sam.factor='organism_part', con.factor='disease', pOA=c(0.01, 5), CV=c(0.3, 100), dir=NULL)
To inspect the results, the following returns three selected rows of the fully preprocessed data matrix (Table 1).
assay(se.fil.hum)[c(5, 733:734), ]
| cerebellum__ALS | frontal.cortex__ALS | cerebellum__normal | frontal.cortex__normal | |
|---|---|---|---|---|
| ENSG00000268433 | 5.324064 | 0.3419665 | 3.478074 | 0.1340332 |
| ENSG00000268555 | 5.954572 | 2.6148548 | 4.934974 | 2.0351776 |
| ENSG00000269113 | 7.544417 | 1.7425299 | 6.808402 | 0.9694065 |
The preprocessed expression values for any gene in the assay slot of se.fil.hum can be plotted as a SHM. The following uses gene ENSG00000268433 as an example. The chosen aSVG is a depiction of the human brain where the assayed featured are colored by the corresponding expression values in se.fil.hum.
shm.df <- spatial_hm(svg.path=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433'), height=0.6, legend.r=1.3)
Figure 3: SHM of human brain
Only cerebellum and frontal cortex are colored, because they are present in both the aSVG and the expression data. The legend plot on the right maps the feature labels to the corresponding spatial regions in the image.
The plotting instructions of the SHM along with the corresponding mapped features are stored in a list named shm.df. Its components can be accessed as follows.
names(shm.df)
## [1] "spatial_heatmap" "mapped_feature"
# Mapped features
shm.df[['mapped_feature']]
## rowID featureSVG condition value
## 1 ENSG00000268433 cerebellum ALS 5.3240638
## 2 ENSG00000268433 frontal.cortex ALS 0.3419665
## 3 ENSG00000268433 cerebellum normal 3.4780744
## 4 ENSG00000268433 frontal.cortex normal 0.1340332
In the above example, the normalized expression values of gene ENSG00000268433 are colored in the frontal cortex and cerebellum, while the different conditions, here normal and ALS, are given in separate SHMs plotted next to each other (Figure 3). The color and feature mappings are defined by the corresponding color key and legend plot on the left and right, respectively.4 The expression values for the gene range from 0-5, while the color key on the left has a range from 0-2500. This is confusing and should be corrected.
SHMs for multiple genes can be plotted by providing the corresponding gene IDs under the ID argument as a character vector. The spatial_hm function will then sequentially arrange the SHMs for each gene in a single composite plot. To facilitate comparisons of expression values across genes and/or conditions, the lay.shm parameter can be assigned 'gene' or 'con', respectively. For instance, in Figure 4 the SHMs of genes ENSG00000268433 and ENSG00000006047 are organised by condition in a horizontal view. This function is particularly useful when comparing gene families.
spatial_hm(svg.path=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433', 'ENSG00000006047'), lay.shm='con', width=1, height=1, legend.r=1.5)
Figure 4: SHMs of two genes
The subplots are organised by “condition” through lay.shm argument.
To provide a high level of flexibility, the spatial_hm contains many arguments. An overview of the argument names and their utility is provide in Table 2.
| argument | description |
|---|---|
| svg.path | Path of aSVG |
| data | Input data of SummarizedExperiment (SE), data frame, or vector |
| sam.factor | Applies to SE. Column name of sample replicates in colData slot. Default is NULL |
| con.factor | Applies to SE. Column name of condition replicates in colData slot. Default is NULL |
| ID | A character vector of row items for plotting spatial heatmaps |
| col.com | A character vector of color components for building colour scale. Default is c(‘purple’, ‘yellow’, ‘blue’) |
| col.bar | ‘selected’ or ‘all’, the former means use values of ID to build the colour scale while the latter use all values in data. Default is ‘selected’. |
| bar.width | A numeric of colour bar width. Default is 0.7 |
| data.trans | ‘log2’, ‘exp2’, or NULL, ‘log2’ transforms data to log2 scale for plotting while ‘exp2’ to 2-base exponent. Default is NULL, no transformation. |
| tis.trans | A vector of aSVG features to be transparent. Default is NULL. |
| width, height | Two numerics of width and height of spatial heatmap plots repsectively. Default is 1, 1. |
| legend.r | The ratio aspect (width to height) of legend plot. Default is 1. |
| sub.title.size | The title size of each spatial heatmap subplot. Default is 11. |
| lay.shm | ‘gen’ or ‘con’, applies to multiple genes or conditions respectively. ‘gen’ means spatial heatmaps are organised by genes while ‘con’ organised by conditions. Default is ‘gen’ |
| ncol | The total column number of spatial heatmaps, not including legend plot. Default is 2. |
| sam.legend | ‘identical’, ‘all’, or a vector of samples/features in aSVG to show in legend plot. ‘identical’ only shows matching features while ‘all’ shows all features. |
| legend.ncol, legend.nrow | Two numbers of columns and rows of legend keys respectively. Default is NULL, NULL, since they are automatically set. |
| legend.position | the position of legend keys (‘none’, ‘left’, ‘right’,‘bottom’, ‘top’), or two-element numeric vector. Default is ‘bottom’. |
| legend.direction | Layout of keys in legends (‘horizontal’ or ‘vertical’). Default is NULL, since it is automatically set. |
| legend.key.size, legend.label.size | The size of legend keys and labels respectively. Default is 0.5 and 8 respectively. |
| line.size, line.color | The size and colour of all plogyon outlines respectively. Default is 0.2 and ‘grey70’ respectively. |
| verbose | TRUE or FALSE. Default is TRUE and the aSVG features not mapped are printed to R console. |
This section generates an SHM plot for mouse data from the Expression Atlas. The code in this subsection is very similar to the previous Human Brain example. For brevity reasons, the corresponding text explaining the code has been reduced to the bare minimum.
The chosen mouse RNA-Seq data compares tissue level gene expression across mammalian species (Merkin et al. 2012). The following searches the Expression Atlas for expression data from ‘heart’ and ‘Mus musculus’.
all.mus <- searchAtlasExperiments(properties="heart", species="Mus musculus")
## Searching for Expression Atlas experiments matching your query ...
## Query successful.
## Found 67 experiments matching your query.
Among the many matching entries, accession ‘E-MTAB-2801’ will be downloaded.
all.mus[7, ]
## DataFrame with 1 row and 4 columns
## Accession Species Type
## <character> <character> <character>
## 1 E-MTAB-2801 Mus musculus RNA-seq of coding RNA
## Title
## <character>
## 1 Strand-specific RNA-seq of nine mouse tissues
rse.mus <- getAtlasData('E-MTAB-2801')[[1]][[1]]
## Downloading Expression Atlas experiment summary from:
## ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-2801/E-MTAB-2801-atlasExperimentSummary.Rdata
## Successfully downloaded experiment summary object for E-MTAB-2801
The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.mus. The following returns only its first three rows.
colData(rse.mus)[1:3, ]
## DataFrame with 3 rows and 4 columns
## AtlasAssayGroup organism organism_part strain
## <character> <character> <character> <character>
## SRR594393 g7 Mus musculus brain DBA/2J
## SRR594394 g21 Mus musculus colon DBA/2J
## SRR594395 g13 Mus musculus heart DBA/2J
The following example shows how to download from the EBI SVG repository an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded in the previous subsection. As before the image is saved to a directory named ~/test.
if (!dir.exists('~/test')) dir.create('~/test')
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('Mus musculus'), keywords.any=TRUE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=FALSE)
To build this vignettes according to the R/Bioconductor package requirements, the following code section uses the aSVG file instance included in the spatialHeatmap package rather than the downloaded instance from the previous example.
feature.df <- return_feature(feature=c('heart', 'kidney'), species=NULL, keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)
## Accessing features...
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
Return the names of the matching aSVG files.
unique(feature.df$SVG)
## [1] "gallus_gallus.svg" "mus_musculus.male.svg"
The following selects mus_musculus.male.svg as the target aSVG, returns the first three rows of the resulting feature.df and also prints the unique set of all aSVG features.
feature.df <- subset(feature.df, SVG=='mus_musculus.male.svg')
feature.df[1:3, ]
## feature id SVG parent index index1
## 10 kidney UBERON_0002113 mus_musculus.male.svg LAYER_EFO 14 12
## 11 heart UBERON_0000948 mus_musculus.male.svg LAYER_EFO 51 49
## 12 path4204 path4204 mus_musculus.male.svg LAYER_OUTLINE 1 1
unique(feature.df[, 1])
## [1] "kidney" "heart"
## [3] "path4204" "aorta"
## [5] "circulatory system" "blood vessel"
## [7] "brown adipose tissue" "white adipose tissue"
## [9] "skin" "stomach"
## [11] "duodenum" "pancreas"
## [13] "spleen" "adrenal gland"
## [15] "colon" "small intestine"
## [17] "caecum" "jejunum"
## [19] "ileum" "esophagus"
## [21] "gall bladder" "parotid gland"
## [23] "submandibular gland" "lymph node"
## [25] "parathyroid gland" "tongue"
## [27] "Peyer’s patch" "prostate gland"
## [29] "vas deferens" "epididymis"
## [31] "testis" "seminal vesicle"
## [33] "penis" "urinary bladder"
## [35] "thymus" "femur"
## [37] "bone marrow" "cartilage"
## [39] "quadriceps femoris" "spinal cord"
## [41] "lung" "diaphragm"
## [43] "peripheral nervous system" "trachea"
## [45] "hindlimb" "trigeminal nerve"
## [47] "eye" "sciatic nerve"
## [49] "intestinal mucosa" "liver"
## [51] "brain" "skeletal muscle"
Obtain path of target aSVG on user system.
svg.mus <- system.file("extdata/shinyApp/example", "mus_musculus.male.svg", package="spatialHeatmap")
Import of custom target file defining simplified sample labels and experimental design. The following imports a sample target file that is included in this package. To instect its content, the first three rows of the target file are printed to the screen.
mus.tar <- system.file('extdata/shinyApp/example/target_mouse.txt', package='spatialHeatmap')
target.mus <- read.table(mus.tar, header=TRUE, row.names=1, sep='\t')
target.mus[1:3, ]
## AtlasAssayGroup organism organism_part strain
## SRR594393 g7 Mus musculus brain DBA.2J
## SRR594394 g21 Mus musculus colon DBA.2J
## SRR594395 g13 Mus musculus heart DBA.2J
unique(target.mus[, 3])
## [1] "brain" "colon" "heart" "kidney"
## [5] "liver" "lung" "skeletal muscle" "spleen"
## [9] "testis"
Load custom target data into colData slot.
colData(rse.mus) <- DataFrame(target.mus)
The raw RNA-Seq count are preprocessed with the following steps including (1) normalization, (2) aggregation of replicates, and (3) filtering of reliable expression data. The details of these steps are explained in the previous sub-section using data from human.
se.nor.mus <- norm_data(data=rse.mus, norm.fun='ESF', data.trans='log2') # Normalization
## Normalising: ESF
## type
## "ratio"
se.aggr.mus <- aggr_rep(data=se.nor.mus, sam.factor='organism_part', con.factor='strain', aggr='mean') # Aggregation of replicates
se.fil.mus <- filter_data(data=se.aggr.mus, sam.factor='organism_part', con.factor='strain', pOA=c(0.01, 5), CV=c(0.6, 100), dir=NULL) # Filtering of genes with low counts and variance
The pre-processed expression data for gene ‘ENSMUSG00000000263’ is plotted in form of an SHM. In this case the plot includes expression data for 8 tissues across 3 mouse strains.
spatial_hm(svg.path=svg.mus, data=se.fil.mus, ID=c('ENSMUSG00000000263'), height=0.5, legend.r=1.1, sub.title.size=9, ncol=3, tis.trans=c('skeletal muscle'), legend.nrow=4, line.size=0.2, line.color='grey70')
Figure 5: Mouse organ SHM
This is a multiple-layer image and skeletal muscle is set transparent to expose lung and heart.
This is a typical example to demonstrate the usage of tis.trans parameter, since this mouse organ image includes tissues in multiple layers and skelectal muscle covers lung and heart. In Figure 5, skelectal muscle is set transparent through tis.trans=c('skeletal muscle') so that lung and heart are exposed. By contrast, in Figure 6 tis.trans=NULL exposes skeletal muscle and lung and heart are covered.
Moreover, presence of too many tissues might affect the visual effects due to the messy polygon outlines. The line.size and line.color parameters are used to adjust the thickness and color of polygon outlines respectively and thus enhance the visualisation. In Figure 2, the default values of the 2 arguments are used.
spatial_hm(svg.path=svg.mus, data=se.fil.mus, ID=c('ENSMUSG00000000263'), height=0.5, legend.r=1.1, sub.title.size=9, ncol=3, tis.trans=NULL, legend.ncol=2, line.size=0.2, line.color='grey70')
Figure 6: Mouse organ SHM
This is a multiple-layer image and skeletal muscle convers lung and heart.
In this example, the data come from developments of 7 chicken organs under 9 time points (Cardoso-Moreira et al. 2019), which is an RNA-seq analysis and accessed from EBI Expression Atlas.
The following process is similar to the Human Brain example, so explanation of code chunks is simplified to avoid lengthy text.
Search Expression Atlas for expression data derived from ‘heart’ and ‘gallus’.
all.chk <- searchAtlasExperiments(properties="heart", species="gallus")
## Searching for Expression Atlas experiments matching your query ...
## Query successful.
## Found 3 experiments matching your query.
Among the results, select ‘E-MTAB-6769’.
all.chk[3, ]
## DataFrame with 1 row and 4 columns
## Accession Species Type
## <character> <character> <character>
## 1 E-MTAB-6769 Gallus gallus RNA-seq of coding RNA
## Title
## <character>
## 1 Chicken RNA-seq time-series of the development of seven major organs
rse.chk <- getAtlasData('E-MTAB-6769')[[1]][[1]]
## Downloading Expression Atlas experiment summary from:
## ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-6769/E-MTAB-6769-atlasExperimentSummary.Rdata
## Successfully downloaded experiment summary object for E-MTAB-6769
A slice of the experiment design, which is stored in colData slot.
colData(rse.chk)[1:3, ]
## DataFrame with 3 rows and 8 columns
## AtlasAssayGroup organism strain genotype
## <character> <character> <character> <character>
## ERR2576379 g1 Gallus gallus Red Junglefowl wild type genotype
## ERR2576380 g1 Gallus gallus Red Junglefowl wild type genotype
## ERR2576381 g2 Gallus gallus Red Junglefowl wild type genotype
## developmental_stage age sex organism_part
## <character> <character> <character> <character>
## ERR2576379 embryo 10 day female brain
## ERR2576380 embryo 10 day female brain
## ERR2576381 embryo 10 day female cerebellum
Download aSVGs from EBI repository based on the tissues and species involved. An empty directory ~/test is suggested to save the downloaded files so as to avoid overwriting existing files.
# Make an empty directory "~/test" if not exist.
if (!dir.exists('~/test')) dir.create('~/test')
# Query aSVGs.
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), keywords.any=TRUE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=FALSE)
As explained in the toy example, the target aSVG image has been included in this package, and will be used for plotting SHMs.
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)
## Accessing features...
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
feature.df
## feature id SVG parent index
## 1 heart UBERON_0000948 gallus_gallus.svg LAYER_EFO 4
## 2 kidney UBERON_0002113 gallus_gallus.svg LAYER_EFO 5
## 3 chicken_outline chicken_outline gallus_gallus.svg LAYER_OUTLINE 1
## 4 brain UBERON_0000955 gallus_gallus.svg LAYER_EFO 3
## 5 liver UBERON_0002107 gallus_gallus.svg LAYER_EFO 6
## 6 skeletal muscle organ UBERON_0014892 gallus_gallus.svg LAYER_EFO 7
## 7 colon UBERON_0001155 gallus_gallus.svg LAYER_EFO 8
## 8 spleen UBERON_0002106 gallus_gallus.svg LAYER_EFO 9
## 9 lung UBERON_0002048 gallus_gallus.svg LAYER_EFO 10
## index1
## 1 2
## 2 3
## 3 1
## 4 1
## 5 4
## 6 5
## 7 6
## 8 7
## 9 8
Get the target aSVG path.
svg.chk <- system.file("extdata/shinyApp/example", "gallus_gallus.svg", package="spatialHeatmap")
Make a targets file based on the experiment design and features of interest in the aSVG. It is included in this package and part is shown below.
chk.tar <- system.file('extdata/shinyApp/example/target_chicken.txt', package='spatialHeatmap')
target.chk <- read.table(chk.tar, header=TRUE, row.names=1, sep='\t')
target.chk[1:3, ]
## AtlasAssayGroup organism strain genotype
## ERR2576379 g1 Gallus gallus Red Junglefowl wild type genotype
## ERR2576380 g1 Gallus gallus Red Junglefowl wild type genotype
## ERR2576381 g2 Gallus gallus Red Junglefowl wild type genotype
## developmental_stage age sex organism_part
## ERR2576379 embryo day10 female brain
## ERR2576380 embryo day10 female brain
## ERR2576381 embryo day10 female cerebellum
Use the targets file to replace the data frame in colData slot.
colData(rse.chk) <- DataFrame(target.chk)
All samples used for plotting SHMs.
unique(colData(rse.chk)[, 'organism_part'])
## [1] "brain" "cerebellum" "heart" "kidney" "ovary"
## [6] "testis" "liver"
All conditions used for plotting SHMs.
unique(colData(rse.chk)[, 'age'])
## [1] "day10" "day12" "day14" "day17" "day0" "day155" "day35" "day7"
## [9] "day70"
Pro-process data matrix: normalize, aggregate, filter. Genes with expression values larger than 5 (log2 unit) in at least 1% of all samples (pOA=c(0.01, 5)), and with coefficient of variance (CV) between 0.6 and 100 (CV=c(0.6, 100)) are retained.
# Normalise.
se.nor.chk <- norm_data(data=rse.chk, norm.fun='ESF', data.trans='log2')
## Normalising: ESF
## type
## "ratio"
# Aggregate replicates.
se.aggr.chk <- aggr_rep(data=se.nor.chk, sam.factor='organism_part', con.factor='age', aggr='mean')
# Filter genes with low variance and low counts.
se.fil.chk <- filter_data(data=se.aggr.chk, sam.factor='organism_part', con.factor='age', pOA=c(0.01, 5), CV=c(0.6, 100), dir=NULL)
Plot SHMs.
spatial_hm(svg.path=svg.chk, data=se.fil.chk, ID='ENSGALG00000006346', legend.r=1.5, sub.title.size=9, ncol=3, legend.nrow=2)
## Enrties not mapped: cerebellum, ovary, testis
Figure 7: Example of plotting chicken organ SHMs
Liver in day10 is not plotted since this tissue in day10 in not available in the data matrix.
The SHM of gene ENSGALG00000006346 is plotted. It is intuitive that the profiles of liver, heart, and kidney are all higher in day17 than other days. Therefore, the important role of this gene in day10 is worth futher exploration. In day10 liver is blank, because in the expression matrix liver data is not availble for day10. This reflects the plotting algorithm that only matching samples between the data and SVG image are plotted.
In this example, the usage of argument ncol is exhibited on how to achieve optimal layout. There are 9 time conditions, so ncol=3 is set to make make full use of the space.
GEO is a another well-known public repository of array- and sequence-based data. To demonstrate the use of spatialHeatmap on this resource, the dataset GSE14502 is plotted on a shoot aSVG image. It a microarray data from a study of translatome variation of Arabidopsis thaliana (Arabidopsis) shoot and root tissues under control and hypoxia conditions (Mustroph et al. 2009), and is downloaded through GEOquery (S. Davis and Meltzer 2007).
Access the GEO dataset GSE14502 and convert it to SummarizedExperiment.
gset <- getGEO("GSE14502", GSEMatrix=TRUE, getGPL=TRUE)[[1]]
## Found 1 file(s)
## GSE14502_series_matrix.txt.gz
## Parsed with column specification:
## cols(
## .default = col_double(),
## ID_REF = col_character()
## )
## See spec(...) for full column specifications.
## File stored at:
## /tmp/RtmpD0dsLu/GPL198.soft
## Warning: 64 parsing failures.
## row col expected actual file
## 22747 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
## 22748 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
## 22749 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
## 22750 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
## 22751 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
## ..... ....... .................. ......... ............
## See problems(...) for more details.
se.sh <- as(gset, "SummarizedExperiment")
Use gene symbols to replace probes.
rownames(se.sh) <- make.names(rowData(se.sh)[, 'Gene.Symbol'])
A slice of the experiment design, which is stored in colData slot. The samples and conditions are included in the title column. In samples, promoter pGL2, pCO2, pSCR, pWOL labels root atrichoblast epidermis, root cortex meristematic zone, root endodermis, root vasculature respectively, and p35S labels root_total and shoot_total. There are 2 conditions: control and hypoxia.
colData(se.sh)[60:63, 1:4]
## DataFrame with 4 rows and 4 columns
## title geo_accession status
## <character> <character> <character>
## GSM362227 shoot_hypoxia_pGL2_rep1 GSM362227 Public on Oct 12 2009
## GSM362228 shoot_hypoxia_pGL2_rep2 GSM362228 Public on Oct 12 2009
## GSM362229 shoot_control_pRBCS_rep1 GSM362229 Public on Oct 12 2009
## GSM362230 shoot_control_pRBCS_rep2 GSM362230 Public on Oct 12 2009
## submission_date
## <character>
## GSM362227 Jan 21 2009
## GSM362228 Jan 21 2009
## GSM362229 Jan 21 2009
## GSM362230 Jan 21 2009
In this example, the aSVG image of Arabidopsis is made from Mustroph et al. (2009). Similarly, it is included in this packaged and thus can be queried locally. The instructions on how to make custom aSVG images are provided in the SVG tutorial.
Query the packaged aSVG files.
feature.df <- return_feature(feature=c('pGL2', 'pRBCS'), species=c('shoot'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)
## Accessing features...
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
All matching aSVGs.
unique(feature.df$SVG)
## [1] "shoot_final.svg" "shoot_root_final.svg"
Select ‘shoot_final.svg’ for plotting spaital heatmaps.
feature.df <- subset(feature.df, SVG=='shoot_final.svg')
# A slice of the feature data frame.
feature.df[1:3, ]
## feature id SVG parent index index1
## 1 shoot_pGL2 shoot_pGL2 shoot_final.svg container 2 2
## 2 shoot_pRBCS shoot_pRBCS shoot_final.svg container 3 3
## 3 g258 g258 shoot_final.svg container 1 1
Get path of ‘shoot_final.svg’.
svg.sh <- system.file("extdata/shinyApp/example", "shoot_final.svg", package="spatialHeatmap")
Make a targets file based on the title column in experiment design and features of interest in the aSVG. It is included in this package and part is shown below.
sh.tar <- system.file('extdata/shinyApp/example/target_arab.txt', package='spatialHeatmap')
target.sh <- read.table(sh.tar, header=TRUE, row.names=1, sep='\t')
target.sh[60:63, ]
## col.name samples conditions
## shoot_hypoxia_pGL2_rep1 GSM362227 shoot_pGL2 hypoxia
## shoot_hypoxia_pGL2_rep2 GSM362228 shoot_pGL2 hypoxia
## shoot_control_pRBCS_rep1 GSM362229 shoot_pRBCS control
## shoot_control_pRBCS_rep2 GSM362230 shoot_pRBCS control
All samples in targets file.
unique(target.sh[, 'samples'])
## [1] "root_total" "root_p35S" "root_pSCR" "root_pSHR"
## [5] "root_pWOL" "root_pGL2" "root_pSUC2" "root_pSultr2.2"
## [9] "root_pCO2" "root_pPEP" "root_pRPL11C" "shoot_total"
## [13] "shoot_p35S" "shoot_pGL2" "shoot_pRBCS" "shoot_pSUC2"
## [17] "shoot_pSultr2.2" "shoot_pCER5" "shoot_pKAT1"
All conditions in targets file.
unique(target.sh[, 'conditions'])
## [1] "control" "hypoxia"
Use the targets file to replace the data frame in colData slot.
colData(se.sh) <- DataFrame(target.sh)
The dataset GSE14502 is already normalised by RMA (Gautier et al. 2004), so the pro-processing only includes aggregation and filtering. Genes with expression values larger than 6 (log2 unit) in at least 3% of all samples (pOA=c(0.03, 6)), and with coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)) are retained.
# Aggregate replicates.
se.aggr.sh <- aggr_rep(data=se.sh, sam.factor='samples', con.factor='conditions', aggr='mean')
# Filter genes with low variance and low intensity.
se.fil.arab <- filter_data(data=se.aggr.sh, sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100), dir=NULL)
Plot SHMs.
spatial_hm(svg.path=svg.sh, data=se.fil.arab, ID=c("HRE2"), height=0.6, legend.nrow=3, legend.r=1.3, legend.key.size=0.3)
## Enrties not mapped: root_total, root_p35S, root_pSCR, root_pSHR, root_pWOL, root_pGL2, root_pSUC2, root_pSultr2.2, root_pCO2, root_pPEP, root_pRPL11C, shoot_total, shoot_p35S
Figure 8: SHMs of Arabidopsis shoot
Pre-defined tissue regions are colored by the expression profile of the target gene. The promoter pGL2, pRBCS, pCER5, pSultr2.2, pSUC2, pKAT1 label shoot trichomes, shoot photosynthetic cell, cotyledon and leaf epidermis, shootbundle sheath, shoot phloem companion cells, Cotyledon and leaf guard cells, respectively.
Figure 8 is the SHM of gene HRE2 under control and hypoxia. It is clear that this gene's exression profiles under control are lower than hypoxia across all the 5 tissues (pGL2, pRBCS, pCER5, pSUC2, pKAT1). Therefore, it can be hypothesised that hypoxia induces over-expression of HRE2 across the 5 tissues and thus HRE2 might be an important factor for Arabidopsis shoot to cope with hypoxia stress. The tissue pSultr2.2 is blank under hypoxia due to unavailability of its data under hypoxia in the data matrix.
jianhai_comment: From here to Shiny App section, slight changes are made.
The Matrix Heatmap is designed to supplement the core feature of SHM. It displays the target gene in the context of corresponding gene network module, so there is a process of gene modules identification.
Adjacency Matrix and Module Identification
The modules are identified by adj_mod. It first computes an adjacency matrix on the gene expression matrix then hierarchically clusters the adjacency matrix by using WGCNA (Langfelder and Horvath 2008) and flashClust (Langfelder and Horvath 2012). The clutersing includes 4 alternative sensitivity levels (ds=0, 1, 2, or 3). From 3 to 0, the sensitivity decreases and results in less modules with larger sizes. Since the interactive network functionality performs better on smaller modules, only ds of 3 and 2 are used. There is another parameter type for module identification: signed and unsinged. The former means both positive and negative adjacency between genes are used while the latter takes the absolute values of negative adjacency.
The function adj_mod returns a list containing an adjacency matrix and a data frame of module assignment. It is domenstrated on the Arabidopsis Shoot data.
adj.mod <- adj_mod(data=se.fil.arab)
The adjacency matrix is a measure of co-expression similarity between genes, where larger value denotes more similarity.
adj.mod[['adj']][1:3, 1:3]
## ndhA petL psaJ
## ndhA 1.0000000 0.5374043 0.6088355
## petL 0.5374043 1.0000000 0.7779227
## psaJ 0.6088355 0.7779227 1.0000000
The module assignment is a data frame. The first column is ds=2 while the second is ds=3. The numbers in each column are module labels with “0” meaning genes not assigned to any modules.
adj.mod[['mod']][1:3, ]
## 2 3
## ndhA 1 0
## petL 1 0
## psaJ 1 0
The matrix heatmap is implemented in function matrix_hm with 2 modes provided: static or interactive. Figure 9 is the static mode on gene HRE2. Setting static=FALSE launches the interactive mode, where users can zoom in and out by drawing a rectangle and by double clicking the heatmap, respectively.
matrix_hm(geneID="HRE2", data=se.fil.arab, adj.mod=adj.mod, angleCol=80, angleRow=35, cexRow=0.8, cexCol=0.8, margin=c(10, 6), static=TRUE, arg.lis1=list(offsetRow=0.1, offsetCol=0.1))
Figure 9: Matrix Heatmap
Rows are genes and columns are samples. The input gene is tagged by 2 black lines.
In Figure 9, the target gene is displayed in the gene module it belongs to, which is indicated by 2 black lines. The rows and columns are sorted by hierarchical clustering dendrograms. The expression matrix of this module is visualised without being scaled (scale="no"). It can be seen that the expression levels of this module is overall much higher in hypoxia than control, and therefore it could potentially be used to infer the hypoxia response mechanism in Arabidopsis.
The same target gene and module from matrix heatmap can also be displayed as a network. Similarly, the network can be dispayed in static or interactive mode.
Setting static=TRUE launches the static network. In Figure 10 Nodes are genes and edges are adjacencies between genes. The thicker edge denotes higher adjacency (co-expression similarity) while larger node indicates higher gene connectivity (sum of a gene's adjacency with all its direct neighbours). The target gene is labeled by ’_selected’.
network(geneID="HRE2", data=se.fil.arab, adj.mod=adj.mod, adj.min=0.75, vertex.label.cex=1.2, vertex.cex=2, static=TRUE)
Figure 10: Static network
Node size denotes gene connectivity while edge thickness stands for co-expression similarity.
Setting static=FALSE launches the interactive network. There is an interactive color bar to denote gene connectivity. The color ingredients must only be separated by comma, e.g. purple,yellow,blue, which means gene connectivity increases from purple to yellow. If too many edges (e.g.: > 300) are displayed, the network could get stuck. So the ‘Input an adjacency threshold to display the adjacency network.’ option sets a threthold to filter out weak edges. If not too many edges retained (e.g.: < 300), users can check ‘Yes’ under ‘Display or not?’, then the network would be responsive smoothly. To maintain acceptable performance, users are advised to choose a stringent threshold (e.g. 0.9) initially, then decrease the value gradually. The interactive feature allows users to zoom in and out, or drag a gene around. All the gene IDs in the network module are listed in ‘Select by id’ in decreasing order according to gene connectivity. Same with static mode, the target gene ID is appended ’_selected’.
If gene annotation is available in rowData slot and provided to ann argument, the annotation is seen by mousing over a node. In this example, Target.Description in rowData is provided to ann.
network(geneID="HRE2", data=se.fil.arab, ann='Target.Description', adj.mod=adj.mod, static=FALSE)
All the above functionality (SHM, interactive matrix heatmap, interactive network) is also combined into a web-browser based Shiny App, which takes advantage of the computational power of R and interactivity of the web. The main benefits of the Shiny App is combine all the utities in one interface and increase interactivity. On the left of this app is the menu. It includes pre-formatted ready-to-use examples, options to upload formatted data matrix and aSVG images, and instruction to use this app. On the right is the interactive interfacce, including Data Matrix, Spatial Heatmap, Matrix Heatmap, and Network. To use interactive features, there are paramters on the left menu to operate. Upon launched, the app automatically displays a pre-formatted example. A good practice to use this app is to follow steps in the menu rather than skipping steps. If unexpectation happens, the app webpage should be refreshed.
This app is launched by the function shiny_all without any parameters. Figure 11 is the screenshot of Spatial Heatmap.
shiny_all()
Figure 11: The snapshot of Shiny App
Left is the menu and right is the Spatial Heatmap.
The data matrix to upload is a data frame. If the data is a SummarizedExperiment class, the data matrix can be obtained by setting a directory path to dir in function filter_data. A folder local_mode_result/ is automatically created in the provided path, and the filtered data matrix is written to local_mode_result/processed_data.txt with column names in the scheme ’sample__condition’ (Table 1), which is a tab-separated file. If users want to see annotation by mousing over a node in the network, a column of gene annotation in rowData slot should be provided to ann, then the annotation is appended to the last column in processed_data.txt.
For example, in filter_data, setting dir='./' (current working directory) will output the filtered data matrix in ./local_mode_result/processed_data.txt, and setting ann='Target.Description' appends the annotation from rowData slot to the last column of processed_data.txt, which is ready to upload to the app.
se.fil.arab <- filter_data(data=se.aggr.sh, ann="Target.Description", sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100), dir='./')
jianhai_comment: the following Supplement section is substantially changed.
To plot SHMs, a pair of data (vector, data frame, SummarizedExperiment) and aSVG image are required. The most important step is to format the data and aSVG image so that target features in aSVG have matching counterparts in the data, since only the matching features in aSVG images are colored. This section explains details of data and aSVG setup that are not covered in the main vignette.
The accepted data classes include vector, data frame, or SummarizedExperiment (SE). A vector applies to several numeric values measured for a single item (e.g. gene), and data frame applies to more items assayed in several samples and/or several conditions (e.g. 2 samples under 2 conditions). By contrast, SE applies to experiments with many samples and many conditions. Formatting the data is essentially define samples and/or conditions.
Vector
In the case of vector, the numeric values are measured from different samples. If one or more conditions are provided, the samples and conditions should be connected by double undescore, i.e. in the form of ’sample__condition’. If no conditions are provided, all the samples are assumed to have same condition, which is the toy example.
Take 2 samples occipital lobe and parietal lobe from the toy example for instance and assume there are 2 conditions, condition1 and condition2. Select 5 random values, assign 4 of them to the 2 samples under the 2 conditions, and the last one to a not-mapped sample. Note the value names should be unique.
# Random numeric values.
vec <- sample(x=1:100, size=5)
# Give unique names to random values.
names(vec) <- c('occipital lobe__condition1', 'occipital lobe__condition2', 'parietal lobe__condition1', 'parietal lobe__condition2', 'notMapped')
vec
## occipital lobe__condition1 occipital lobe__condition2
## 95 42
## parietal lobe__condition1 parietal lobe__condition2
## 32 65
## notMapped
## 90
Plot SHMs.
spatial_hm(svg.path=svg.hum, data=vec, ID='toy', ncol=1, legend.r=1.2, sub.title.size=14)
Data Frame
In the case of data frame, numeric values are measured from different samples. Similarly, if one or more conditions are provided, the column names should be in the form of ’sample__condition’. If no conditions are provided, all the samples are assumed to have same condition.
Take the same samples and conditions in the vector case as example.
Make a numeric data frame of 20 rows and 5 columns. Name columns with the value names (each is unique) from above vector and rows with 20 genes (gene1, gene2, …, gene20).
# Make a numeric data frame.
df.test <- data.frame(matrix(sample(x=1:1000, size=100), nrow=20))
# Name the columns.
colnames(df.test) <- names(vec)
# Name the rows.
rownames(df.test) <- paste0('gene', 1:20)
# A slice of the data frame.
df.test[1:3, ]
## occipital_lobe__condition1 occipital_lobe__condition2
## gene1 831 244
## gene2 772 114
## gene3 28 925
## parietal_lobe__condition1 parietal_lobe__condition2 notMapped
## gene1 75 109 790
## gene2 847 316 739
## gene3 653 711 873
In the downstream interactive network, if users want to have a gene annotation by mousing over a node, a column of gene annotation can be appended to the data frame. For example, the 20 genes are annotated as ann1, ann2, …, ann20.
df.test$ann <- paste0('ann', 1:20)
df.test[1:3, ]
## occipital_lobe__condition1 occipital_lobe__condition2
## gene1 831 244
## gene2 772 114
## gene3 28 925
## parietal_lobe__condition1 parietal_lobe__condition2 notMapped ann
## gene1 75 109 790 ann1
## gene2 847 316 739 ann2
## gene3 653 711 873 ann3
Plot SHMs on gene1.
spatial_hm(svg.path=svg.hum, data=df.test, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)
SummarizedExperiment
In the following, the same samples and conditions in the above data frame are taken as example.
Formatting data of SummarizedExperiment (SE, Morgan et al. (2018)) is essentially to make a targets file (a data frame of column metadata). The targets file usually has at least 2 columns that specifies sample and condition replicates respectively, and should be added to the colData slot. The data matrix should have assayed items (e.g. genes) and sample/conditions in rows and columns respectively, and must be in the assay slot. The rowData slot can store a data frame of annotaions corresponding to rows in assay slot, but is not required.
To plot spaital heatmap successfully, the targets file should meet the following requirements.
It is a data frame and usually has at least one column of samples and one column of conditions. The rows correspond with columns in assay slot. If the condition column is not defined, the samples are assumped under same condition.
The sample column specifies sample replicates. It is crucial that replicate names of the same sample must be identical. Otherwise, they are treated as different samples. E.g. occipital lobe, occipital lobe are the same sample while occipital lobe1, occipital lobe2 are different samples.
The sample identifiers of interest must be identical with features of interest in aSVG respectively. It means even a dot, undescore, space, etc can make a difference and lead to target features not colored in SHMs. Since double underscore (__) is a reserved separator in spatialHeatmap, it cannot be used in sample or condition identifiers.
The condition column has the same requirement with the sample column. E.g. condition1, condition1 is same conditoin while condition1A, condition1B is treated as different conditions.
In the following example, occipital lobe has 2 conditions condition1 and condition2, and each condition has 2 replicates, so there are 4 assays for occipital lobe. The same applies to parietal lobe. Based on this experiment design, the corresponding targets file is made, where a row is an assay.
# Two samples.
sample <- c(rep('occipital lobe', 4), rep('parietal lobe', 4))
# Two conditions.
condition <- rep(c('condition1', 'condition1', 'condition2', 'condition2'), 2)
# Targets file.
target.test <- data.frame(sample=sample, condition=condition, row.names=paste0('assay', 1:8))
target.test
## sample condition
## assay1 occipital lobe condition1
## assay2 occipital lobe condition1
## assay3 occipital lobe condition2
## assay4 occipital lobe condition2
## assay5 parietal lobe condition1
## assay6 parietal lobe condition1
## assay7 parietal lobe condition2
## assay8 parietal lobe condition2
Make a random numeric data frame of 8 columns and 20 rows. Each column is an assay and each row is a gene's expression profile. Columns must correspond with rows in targets file, so column names are assigned assay1-8.
# Make a numeric data frame.
df.se <- data.frame(matrix(sample(x=1:1000, size=160), nrow=20))
# Name the rows.
rownames(df.se) <- paste0('gene', 1:20)
# Replace the default column names.
colnames(df.se) <- row.names(target.test)
# A slice of the data frame.
df.se[1:3, ]
## assay1 assay2 assay3 assay4 assay5 assay6 assay7 assay8
## gene1 205 876 410 257 144 973 852 787
## gene2 413 557 256 926 245 496 793 291
## gene3 743 809 958 189 669 837 764 740
se <- SummarizedExperiment(assays=df.se, colData=target.test)
se
## class: SummarizedExperiment
## dim: 20 8
## metadata(0):
## assays(1): ''
## rownames(20): gene1 gene2 ... gene19 gene20
## rowData names(0):
## colnames(8): assay1 assay2 ... assay7 assay8
## colData names(2): sample condition
Similarly, in the downstream interactive network, if users want to have a gene annotation by mousing over a node, a data frame of gene annotation can be added to rowData slot, i.e. the ann column in df.test.
rowData(se) <- df.test['ann']
In this simple example, the normalization and filtering process is left out, but replicates should be aggregated. In function aggr_rep, the sample and condition columns in targets file are concatenated with double underscore to form ’sample__condition’ replicates for aggregating.
se.aggr <- aggr_rep(data=se, sam.factor='sample', con.factor='condition', aggr='mean')
assay(se.aggr)[1:3, ]
## occipital_lobe__condition1 occipital_lobe__condition2
## gene1 540.5 333.5
## gene2 485.0 591.0
## gene3 776.0 573.5
## parietal_lobe__condition1 parietal_lobe__condition2
## gene1 558.5 819.5
## gene2 370.5 542.0
## gene3 753.0 752.0
Plot SHMs on gene1.
spatial_hm(svg.path=svg.hum, data=se.aggr, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)
The aSVG repository is from EBI Gene Expression Group, where the requirements on aSVG format are included. It contains aSVGs across different species and can be downloaded with funtion return_feature directly. If users cannot find a target aSVG in this repository, there is a step-by-step SVG tutorial to create custom aSVG images, which is developed by this project.
To change existing feature identifiers in aSVG, the function update_feature should be used. For testing purpose, an empty folder ~/test1 is created and a copy of the aSVG homo_sapiens.brain.svg packaged in spatialHeatmap is saved in there.
# Make an empty directory.
if (!dir.exists('~/test1')) dir.create('~/test1')
# Copy the "homo_sapiens.brain.svg" aSVG.
svg.hum <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap")
file.copy(from=svg.hum, to='~/test1', overwrite=FALSE)
Use feature and species keywords to query the aSVG and return existing features, which is a data frame.
feature.df <- return_feature(feature=c('frontal cortex'), species=c('homo sapiens', 'brain'), dir='~/test1', remote=FALSE, keywords.any=FALSE)
feature.df
Make a vector of new feature identifiers corresponding to every returned feature, e.g. replacing spaces with dots. This vector must be added to the first column of the feature data frame, since that is where update_feature looks for new features. Then features are updated with update_feature.
# A vector of new features.
f.new <- c('frontal.cortex', 'prefrontal.cortex')
# New features added to the first column of feature data frame.
feature.df.new <- cbind(featureNew=f.new, feature.df)
feature.df.new
# Update the features.
update_feature(feature=feature.df.new, dir='~/test1')
# Version Informaion
sessionInfo()
## R Under development (unstable) (2020-04-16 r78240)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 9 (stretch)
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.7.0
## LAPACK: /usr/local/lib/R-devel/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] GEOquery_2.57.0 ExpressionAtlas_1.17.0
## [3] xml2_1.3.2 limma_3.45.0
## [5] SummarizedExperiment_1.19.2 DelayedArray_0.15.1
## [7] matrixStats_0.56.0 Biobase_2.49.0
## [9] GenomicRanges_1.41.1 GenomeInfoDb_1.25.0
## [11] IRanges_2.23.4 S4Vectors_0.27.5
## [13] BiocGenerics_0.35.2 spatialHeatmap_0.99.0
## [15] knitr_1.28 BiocStyle_2.17.0
##
## loaded via a namespace (and not attached):
## [1] backports_1.1.7 rols_2.17.1 Hmisc_4.4-0
## [4] igraph_1.2.5 lazyeval_0.2.2 shinydashboard_0.7.1
## [7] splines_4.1.0 BiocParallel_1.23.0 ggplot2_3.3.0
## [10] digest_0.6.25 foreach_1.5.0 htmltools_0.4.0
## [13] GO.db_3.11.1 gdata_2.18.0 magrittr_1.5
## [16] checkmate_2.0.0 memoise_1.1.0 cluster_2.1.0
## [19] doParallel_1.0.15 readr_1.3.1 fastcluster_1.1.25
## [22] annotate_1.67.0 prettyunits_1.1.1 jpeg_0.1-8.1
## [25] colorspace_1.4-1 blob_1.2.1 xfun_0.13
## [28] dplyr_0.8.5 crayon_1.3.4 RCurl_1.98-1.2
## [31] jsonlite_1.6.1 genefilter_1.71.0 impute_1.63.0
## [34] survival_3.1-12 iterators_1.0.12 glue_1.4.1
## [37] gtable_0.3.0 zlibbioc_1.35.0 XVector_0.29.0
## [40] scales_1.1.1 DBI_1.1.0 edgeR_3.31.0
## [43] Rcpp_1.0.4.6 viridisLite_0.3.0 xtable_1.8-4
## [46] progress_1.2.2 htmlTable_1.13.3 gridGraphics_0.5-0
## [49] flashClust_1.01-2 foreign_0.8-78 bit_1.1-15.2
## [52] preprocessCore_1.51.0 Formula_1.2-3 rsvg_2.0
## [55] htmlwidgets_1.5.1 httr_1.4.1 gplots_3.0.3
## [58] RColorBrewer_1.1-2 acepack_1.4.1 ellipsis_0.3.1
## [61] farver_2.0.3 pkgconfig_2.0.3 XML_3.99-0.3
## [64] nnet_7.3-13 locfit_1.5-9.4 dynamicTreeCut_1.63-1
## [67] labeling_0.3 ggplotify_0.0.5 tidyselect_1.1.0
## [70] rlang_0.4.6 later_1.0.0 AnnotationDbi_1.51.0
## [73] visNetwork_2.0.9 munsell_0.5.0 tools_4.1.0
## [76] RSQLite_2.2.0 fastmap_1.0.1 evaluate_0.14
## [79] stringr_1.4.0 ggdendro_0.1-20 yaml_2.2.1
## [82] bit64_0.9-7 caTools_1.18.0 purrr_0.3.4
## [85] mime_0.9 compiler_4.1.0 rstudioapi_0.11
## [88] curl_4.3 plotly_4.9.2.1 png_0.1-7
## [91] tibble_3.0.1 geneplotter_1.67.0 stringi_1.4.6
## [94] highr_0.8 lattice_0.20-41 Matrix_1.2-18
## [97] vctrs_0.3.0 pillar_1.4.4 lifecycle_0.2.0
## [100] BiocManager_1.30.10 data.table_1.12.8 bitops_1.0-6
## [103] grImport_0.9-3 httpuv_1.5.2 R6_2.4.1
## [106] latticeExtra_0.6-29 bookdown_0.19 promises_1.1.0
## [109] KernSmooth_2.23-16 gridExtra_2.3 codetools_0.2-16
## [112] MASS_7.3-51.5 gtools_3.8.2 assertthat_0.2.1
## [115] DESeq2_1.29.1 GenomeInfoDbData_1.2.3 hms_0.5.3
## [118] grid_4.1.0 rpart_4.1-15 tidyr_1.0.3
## [121] rmarkdown_2.1 rvcheck_0.1.8 shiny_1.4.0.2
## [124] WGCNA_1.69 base64enc_0.1-3
This project has been funded by NSF awards: PGRP-1546879, PGRP-1810468, PGRP-1936492.
Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9.
Chang, Winston, and Barbara Borges Ribeiro. 2018. Shinydashboard: Create Dashboards with ’Shiny’. https://CRAN.R-project.org/package=shinydashboard.
Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. n.d. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Davis, Sean, and Paul Meltzer. 2007. “GEOquery: A Bridge Between the Gene Expression Omnibus (Geo) and Bioconductor.” Bioinformatics 14: 1846–7.
Gatto, Laurent. 2019. Rols: An R Interface to the Ontology Lookup Service. http://lgatto.github.com/rols/.
Gautier, Laurent, Leslie Cope, Benjamin M. Bolstad, and Rafael A. Irizarry. 2004. “Affy—analysis of Affymetrix Genechip Data at the Probe Level.” Bioinformatics 20 (3). Oxford, UK: Oxford University Press: 307–15. doi:10.1093/bioinformatics/btg405.
Keays, Maria. 2019. ExpressionAtlas: Download Datasets from Embl-Ebi Expression Atlas.
Langfelder, Peter, and Steve Horvath. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9 (December): 559.
———. 2012. “Fast R Functions for Robust Correlations and Hierarchical Clustering.” Journal of Statistical Software 46 (11): 1–17. http://www.jstatsoft.org/v46/i11/.
Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for Rna-Seq Data with Deseq2.” Genome Biology 15 (12): 550. doi:10.1186/s13059-014-0550-8.
Maag, Jesper L V. 2018. “Gganatogram: An R Package for Modular Visualisation of Anatograms and Tissues Based on Ggplot2.” F1000Res. 7 (September): 1576.
McCarthy, Davis J., Chen, Yunshun, Smyth, and Gordon K. 2012. “Differential Expression Analysis of Multifactor Rna-Seq Experiments with Respect to Biological Variation.” Nucleic Acids Research 40 (10): 4288–97.
Merkin, Jason, Caitlin Russell, Ping Chen, and Christopher B Burge. 2012. “Evolutionary Dynamics of Gene and Isoform Regulation in Mammalian Tissues.” Science 338 (6114): 1593–9.
Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2018. SummarizedExperiment: SummarizedExperiment Container.
Mungall, Christopher J, Carlo Torniai, Georgios V Gkoutos, Suzanna E Lewis, and Melissa A Haendel. 2012. “Uberon, an integrative multi-species anatomy ontology.” Genome Biol. 13 (1): R5. doi:10.1186/gb-2012-13-1-r5.
Muschelli, John, Elizabeth Sweeney, and Ciprian Crainiceanu. 2014. “BrainR: Interactive 3 and 4D Images of High Resolution Neuroimage Data.” R J. 6 (1): 41–48.
Mustroph, Angelika, M Eugenia Zanetti, Charles J H Jang, Hans E Holtan, Peter P Repetti, David W Galbraith, Thomas Girke, and Julia Bailey-Serres. 2009. “Profiling Translatomes of Discrete Cell Populations Resolves Altered Cellular Priorities During Hypoxia in Arabidopsis.” Proc Natl Acad Sci U S A 106 (44): 18843–8.
Papatheodorou, Irene, Nuno A Fonseca, Maria Keays, Y Amy Tang, Elisabet Barrera, Wojciech Bazant, Melissa Burke, et al. 2018. “Expression Atlas: gene and protein expression across multiple studies and organisms.” Nucleic Acids Res. 46 (D1): D246–D251. doi:10.1093/nar/gkx1158.
Prudencio, Mercedes, Veronique V Belzil, Ranjan Batra, Christian A Ross, Tania F Gendron, Luc J Pregent, Melissa E Murray, et al. 2015. “Distinct Brain Transcriptome Profiles in C9orf72-Associated and Sporadic ALS.” Nat. Neurosci. 18 (8): 1175–82.
Waese, Jamie, Jim Fan, Asher Pasha, Hans Yu, Geoffrey Fucile, Ruian Shi, Matthew Cumming, et al. 2017. “EPlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology.” Plant Cell 29 (8): 1806–21.
Winter, Debbie, Ben Vinegar, Hardeep Nahal, Ron Ammar, Greg V Wilson, and Nicholas J Provart. 2007. “An ‘Electronic Fluorescent Pictograph’ Browser for Exploring and Analyzing Large-Scale Biological Data Sets.” PLoS One 2 (8): e718.